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Unbiased risk estimation, a la Stein, is studied for infinitely divisible 
laws with finite second moment 

Let US start by briefly recalling the framework and results of Stein ((Bj): Let Xi, 
i = 1, . . . , n, be iid N{0, cr^) random variables and let g = {gi, . . . , gn) ■ K." — > M", 
be "weakly differentiable." Then for aU 6* e K", 

(1) E\\X + e + g(X + e)-6\\l= no^ + E\\g{X + B)\l 

" Pi 

z— 1 * 

where || • II2 is the Euclidean norm. Thus the risk of the estimator x + g(x) can 

% 
. , dxi 

only if the variance of the risk estimate is small compared to the actual risk. This 
is especially the case if gi only depends on Xi, since then the strong law of large 
numbers kicks in. For normal random variables the existence of the above estimates 
is based on the identity 

7'(x)e-^'/2^a; = / xg(x)e-'''' 1'^ dx. 



be estimated unbiasedly by na^ + ^(x)^ + ^ ^^(x). This estimate is useful 



We obtain below a corresponding identity for infinitely divisible random variables 
with finite variance, replacing g' by K{g), where K is an operator commuting with 
translations. 

Let / be a density on R, with mean and variance (for simplicity of notation 
we concentrate on the univariate case, but see Remark|31). Let d{x) — x + g{x) be 
an estimator in the location model induced by /. Let F = {f * 6e : 9 ^ M}, while 
L'^{F) and L^{F) have their canonical meaning. We want to estimate the risk of d 
unbiasedly: 

{d{x + e)- eff{x)dx 

{x + eff{x)dx+ I x^f{x)dx + 2 I xg{x + e)f{x)dx. 
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In the above right hand side, the first summand can be estimated unbiasedly, the 
second is a constant, so we just need to find a function h e L-^{F) such that 

(2) / h{x + 9)f{x)dx = I xg{x + e)f{x)dx. 
Jr Jr 

If g is a polynomial the right-hand side of 101 is itself a polynomial in 9. It is then 

well-known that there exists an h satisfying But if g is the soft-thresholding 

operator, i.e., g{x) ~ T^{x) — {\x\ — A)+sgn(2;), then g does not even have a power 

series expansion. Moreover, h does not have to be unique. Indeed, h + q is also a 

solution for any function q such that q* f = (if /, the Fourier transform of /, has 

zeros such q might exists). 

Hence, let us assume that / does not have zeros. By computing the generalized 

Fourier transform of both sides of 

(3) J g{~x + e){-x)f{-x)dx ^ J h{~x + e)f{~x)dx, 
we get: 

(4) g{w)r{~w) = ih{w)n^w). 

This identity shows that if g converges to fast enough, e.g., if g has compact 
support, then there exists an h such that (0) holds. Since / does not vanish, h is 
uniquely determined. Hence the set 

Uf:= 

|g e L^{F) : 3h e J h{x + e)f{x)dx = J xg{x + e)f{x)dxy0 e 

is a vector space and clearly there is a unique linear map Kf : Uf — > L^{f) with 

Kf{g){x + e)f{x)dx = / g{x + e)xf{x)dx. 
: Jr 

Let us note some properties of Kf: 

Theorem 1. Let /, /i,/2 be densities with finite second moment, and let Kf, 
Kf-^ and Kf^ he well defined. Then for all b Cz and g Cz Uf, respectively g G Uf-^^:f^ : 

1. Kf{g{- + b)) = Kf{g){- + b) 

2. Kf,sA9)^Kf{g)+bg{-) 

3- Kf,,fM = KfM+Kf,{g) 

I K,f(.,){g) = Kf{g{-/b)){-b)/b, for b > 0. 
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Proof. 1. J^g{x + 6 + b)xf{x)dx = J^Kf{g){x + 6 + b)f{x)dx and hence 
Kjigi- + b))=Kjig){- + b). 
2. 



xg{x + 9) fix - b)dx ^ / iKf{g){x + 9 + b) + bg{x + 9 + b))f{x)dx 

{Kf{g){x + 9) + bg{x + 9))f{x - b)dx. 
3. let hi, /i2 be such that g{x + 9)xfi{x)dx — hi{x + 9)f{x)dx, i = 1, 2, then 

{hi+h2){z+9){h*f2){z)dz 

{hi{x + y + 9) + h2{x + y + 9))fi{x)f2{y)dxdy 

hi{x + y + 9)fi{x)dxf2{y)dy+ / h2{y + x + 9) f2{y)dy h{x)dx 

JmJm 



l{z + 9)z{h*f2){z)dz. 



4. 



g{x + 9)x{bf{bx))dx = [ g{x/b + 0)xf{x)/bdx 
Js. 

g{{x + b9)/b)x/bf{x)dx 



Kf{g{-/b))ix + be)/bf{x)dx 

Kf{g{-/bMx + 9)b)/b{bf{xb))dx, 
and thus K,j^.b)i9) = Kf{gi-/b))i-b)/b. 

Note that the third property presented above is very useful for wavelet analysis, 
since the law of the noise in a wavelet coefficient is a weighted convolution of the 
noise in the original data. 

For the normal distribution with unit variance the operator K is defined by 
K(g) = g' , i.e. K is the differentiation operator. In general K is quite complicated 
to compute, however from Q we see that formally 

Kj{^){w)=g{w)f'{-w)/{f{-w)i). 

This suggest that h can be computed by a convolution of the estimator and of a 
function or measure, which is the inverse Fourier transform of f'{—w) / {f{—w)i). Let 
us try to further formalize this claim. Assume Kf{g) := Kf * g, where Kf G L^(R) 
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and kf = f\—-)l(f{—-)i)- If 5 € L°°(R), then Kf * g does what it is supposed to 
do: 



{Kf*g){x + e)f{x)dx^ / / Kf{x-t)g{t + 9)dtfix)dx 

Jr Jr 

= 11 Kf{x-t)f{x)dxg{t + e)dt 

Jr Jr 

= 11 Kf{-{t-x))f{x)dxg{t + 9)dt 
Js. Jr 

= [ {Kf{-)*f{.mg{t + e)dt 

Jr 

= I tf{t)g{t + e)dt, 
Jm 

where the last equaUty follows from the construction of K: 

K{-)f = ^ - C/Ojid). 

t 

If Kf is known, then it can still be a problem to compute Kf{g), since Kf(g) is 
not necessarily as simple as g' . If 5 = '^^di and the Kf{gi) are easy to compute, 
then we can compute Kf{g) since Kf is linear. For example we can take g~^{x) = 
{x — A)+ and gx{x) '■= {x — A)~ as simple building blocks for functions. Note that 
Kfig+){x) = Kf{g+)ix - A) and Kf{g^{x)) = - Kf{g+) if J^xf{x)dx = 
and J^x'^f{x)dx = a^. For example the soft thresholding estimator as well as 
given by T^{x) := a;l{|a;|>A} + 2(|a;| - A/2)+sgn(x)l||^|<;^}, have the following 
decompositions: 

(5) T^{x) =x- g+{x) + g+{x) - g^ix) + gZ^ix), 

Tx{x) = x-g+{x)+ 2.9+ 2 (x) - g+{x) - g^ix) + 2gZx/^{x) - gZy^{x). 

For a further example, assume that g : M+ — > R, is twice continuously differentiable 
with 5(0) = 0, then g{x) = g'{0+)x+ + - y)+g"{y)dy. 

Another simple example is provided by compoimd Poisson distributions. Indeed, 
let be a compound Poisson distribution with Fourier transform exp(A(^'('u;) — 1)), 
where ^ is the characteristic function of the density /. Then 

and thus Kp{x) = —\f{—x)x, i.e. Kpig) = Kp * g. Since compound distributions 
are building blocks for infinitely divisible ones, we have: 

Theorem 2. Let f be an infinitely divisible density with finite second moment, 
i.e., let 

m = exp (^bt+^( -Mi^^Lll^) Midx) 



STEIN ESTIMATION 



5 



where M is a finite positive measure. Let M({0}) ^ 0, letb ~ and let g be Lipschitz. 
Then 

Jr X 

is a well defined real valued function, which is moreover bounded and continuous. 
Furthermore, 

'' K{g){x + e)f{x)dx^ j xg{x + e)f{x)dx. 

Proof. It is clear that K{g) is well defined, bounded and continuous. If g has 
compact support then \g{x + y) — g{x)\/\y\M(dy) and K{g) are in iy-'^(M) and 

K^)^t)= I I i^y±^1^9iy)M{dx)eM^ty)dy 
i- — ^ — 9{y)_ gxp(ity)(iyM((ix) 



^, , , exp(—ixt) ^ 1 , , , 

git) / — — M{dx). 



Since Jg(exp(— ixi) — l)/xM{dx) = f'{—t)/{f{—t)i), the Fourier transforms of 
J^K{g){x + 6)f{x)dx and J^g{x + 0)xf{x)dx are equal and thus these two terms 
are themselves equal for all 6* e R. 

If g is Lipschitz but does not have compact support, then let gn{x) := (1 — 
\x\/n)+g{x). Then the Lipschitz constants of the 5„ form a bounded set. Clearly, 
gn and if(g„) respectively converge pointwise, respectively to g and K(g), and 

moreover ||if(g„)||oo is bounded. Hence lim„ >oo J^9n{x + 9)xf{x)dx — S^g{x + 

9)xf[x)dx and hm„ ,00 /r K{gn){x + 9)f[x)dx = Jjj K{g){x + 9)f{x)dx, for all 9. 

Thus 

K{g){x + 9)f{x)dx ^ 1 g{x + 9)xf{x)dx. 

Remark 3. The assumption 6 = 0, is not serious, 6 is a location parameter 
of the density and thus we can use Theorem ^ The condition M{{0}) = is 
not restrictive either. If M{{Q}) — cr^, then the distribution is the convolution 
of a centered normal distribution with variance and of an infinitely divisible 
distribution with Levy measure without atom at the origin. Again we can use 
Theorem n for this situation. Hence in the general case we obtain: 

K{g){t) bg{t) + M({0})g'(<) + / S^Lt^)^^^ M (dx) . 



IR\{0} 

We also note here that although of little interest to us since we are dealing with 
mean square errors, the operator K could as well be defined just under a finite first 
moment assumption on X (in which case, the requirement on M will change too). 
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Remark 4. Let / be a density with mean zero and variance . Without loss 
of generality let also i^/(l) = 0. Let /„ = *i=i\/nf{-y/n)- By the central limit 
theorem /„ converges in distribution to a normal density. So one would expect that 
Kf^ converges in some sense to a'^d/dx. Assume that Kf{g){x) = jTg^{g{x + y) — 
g{x))/yM(dy). Note that if Kf{g) = Q * g, where Q is a measure, then with the 
notation Q^iA) Q{-A), 



{Q*g){x)^ / g{x - y) - g{x)Q{dy) 
g(x + y)-g{x) 



yQ {dy), 

y 

where the first equality holds since Kf{l) — 0, i.e. Jjg^lQ{dx) = 0. Since 
Jt^x{x + 9)f{x)dx = J^x"^ f{x)dx, taking g{x) = x gives A/(M) ~ K{x) 
and j^K{x)f(x)dx — J^x'^f{x)dx = cP' . As we already know Kf^^{g){x) 
= K f{g{- / y/n)){x^/n) / y/n. Using the form of Kf, we now have 

K- f \f \ f 9{x + y/^/^)- g{x) 

Thus, if g is Lipschitz and differentiable, lim„ >oo Kf^^{g){x) = a^g'{x). 

Examples. 1. Let f{x) — exp(—\/2|a:^|)/V2 be the variance normalized Laplace 
density. It is easy to see that, f{w) = 2/(2 + w^). Thus 

f{w) _ 2iw 
if{w) ~ 2 + ^2 ' 

and 

— '2i'i'w ^ ^ 

2 + 

Thus K{x) = — exp(— V2|2;|)sgn(a::) G L^(]R). Tedious but simple computations 
yield 

K*x+^ { ^ \ h{x). 

Using © we obtain 

K{T^{x) -x)= -h{x) - (1 - h{x)) + h{x - A) + (1 - h{x + A)) 
= h{x - A) - h{x + A). 
Combining these results, we see that for X with a Laplace distribution, 
E{T^{X + e)-9f = 1 + Emm{{X + df, A^) + 2{h{X + 9 - X) - h{X + + A)). 

2. Let ft{x) — exp(— a;)x*~^/r(t)lR+ (x) be the density of the Gamma distribution. 
Since the mean of this distribution is t we want to compute Kf^^^S-t ■ Then by Feller 
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p. 567], log(/t(a;)) = tj^{cxp{iyx) - l)/yexp{-y)dy and thus (log/t)'(a;) 
it exp{iyx) exp{—y)dy. Thus Kf^ (g) — Q * g where Q £ L^(R) and 



Q{x)=t exp{-iyx) exp{-y)dy = t exp{ixy) exp{y)dy. 

Jo J-oc 

Hence Kf^^s-t{9){x) = t j"^exp{y)g{x - y)dy - tg{x). Now, symmetrizing ft (to 
have a zero mean density) gives ft{x) = cxp{—\x\)\x\*~^ /2T{t)l^{x) from which 
the corresponding Kj^ follows. 

3. Another example is the cosine hyperbolic density, f{x) ~ 1/ cosh(7ra;/2), again 
p. 567] 

1 i7i \\ f exp{ixy) - 1 ~ iyx y 

log / X = / -dy 

Jr V exp(y) - exp(-y) 

and thus 

ir I \( \ f 9{x + y) - 9{x) y 

Kf{g){x) = / -dy. 

Jm y exp(y) - exp(-y) 

The examples presented above are infinitely divisible distributions and so K has 
a nice form. Let us consider a case which is not: The uniform distribution with 
density 2^^1(_i i). Assume g : [—1, 1] — > R and 2^^ g{x)dx = 0. If 5 is the 2- 
periodic extension of g on R then 2^^(; * 1) = 0. Thus unbiased risk estimators 
are not uniquely determined. Let 



r{e)^2-^j x{x + 0)+dx = 




12 



< -1 

e(-ia) 
> 1 



After some tries one finds that with 



, . , rO :x<0 

h{x) = I {x-[x/2]2){x-[x/2]2-2) . > Q ' 

{h is the 2-periodic extension of — a;(a; — 2)/2 defined on [0, 2] to M+.) 2~^ J^-^ h{x + 
9)dx = r{9). So with the help of (|3J| we can now compute an unbiased risk esti- 
mator for soft thresholding. Figure shows the unbiased risk estimators for soft 
thresholding with threshold 2 for the normal distribution, the Laplace distribution, 
the gamma distribution with t = 2 and the uniform distribution. The distributions 
were transformed to have unit variance and mean zero. 
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Remark 5. As we have seen with (Q, for normal random variables, unbiased 
risk estimation is possible for multivariate means, even if the estimators for the co- 
ordinates are not independent. This is also possible for other types of distributions, 
one has to apply the operator K coordinatewise. Let X^, i = 1, . . . , n be random 
variables, Xi has distribution Fi and EXi = 0, EXf = a^. Assume that an operator 
Ki exists such that EXig{Xi + Oi) = EKi{g){Xi + Oi) for some g. If ,g : M" — > R 
and 6* e R" then E(Xi + g{X + 6) - Oif = (rf + Eg{X + Of + 2EXig(X + 9). 
Then, under the proper conditions on g, 

EXig{X + 9) 

= / / Xig{xi+9i,...,Xn+9n)Fi{dx)®"^2Pr{d{x2,...,Xn)) 

= / / Ki{g{-,X2+92,...,X.n+9n)){xi+9i)Fi{dx) 

JR"-i JR 

®'l=2Ft{d{x2,...,X.n)). 

Thus E{Xi + q{X + 9)- 9if =al + Eg{X + 9f+ 2EKi{g{-,X2 + 6I2, . . . , X„ + 
9n){Xi+9i). 

Remark 6. As the reader might have guessed by now, the motivation for the 
present paper comes from thresholding methods in wavelet denoising (see ^). In 
a function space approach to denoising, the thresholds depend on the sample size 
n, on the Besov space to which the target functions belong to and also on the 
Besov norm of these targets. In practice it is often not known which threshold is 
appropriate since the function space to which the signal belongs as well as the value 
of its norm are unknown. To bypass this problem, Donoho and Johnstone developed 
a procedure called SureShrink where thresholds are chosen automatically (see jSj)- 
Their method, based on Stein's unbiased risk estimate is as follows: for each level 
(except the highest levels) in the noisy wavelet transform, the largest threshold 
(smaller than -^2 log n) which minimizes the unbiased risk estimate is chosen. For 
soft thresholding finding this minimum is simple and takes O(nlogn) time. 

As noticed in (Tl), the central limit theorem works fast for wavelet coefficients, 
so it is reasonable to apply the normal adaptive results to the general non Gaussian 
framework. However, it is also of interest to understand the scope of SureShrink 
beyond the normal framework. To do so, we needed to find unbiased risk estimates 
for other types of distributions. This is what we did here for infinitely divisible and 
related noise. We could then potentially use the corresponding thresholds found in 
H] and ig. 
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The unbiased risk estimators for soft thresholding 



